<TeXmacs|2.1>

<style|<tuple|generic|old-dots|old-lengths>>

<\body>
  <section|Preliminaries>

  Herein, we collect the essential material needed for the calculations
  that follow.

  <subsection|Feynman Diagram>

  <\lemma>
    <label|Lemma: Feynman Diagram>[Feynman Diagram]

    Let <math|f<around*|(|t|)>\<in\>C<rsup|\<infty\>><around*|(|\<bbb-R\><rsup|n>,\<bbb-R\><rsub|+>|)>>
    with <math|f<around*|(|0|)>=1>, and define
    <math|m<rsub|\<alpha\><rsub|1>\<cdots\>\<alpha\><rsub|n>>\<assign\>\<partial\><rsub|\<alpha\><rsub|1>>\<cdots\>\<partial\><rsub|\<alpha\><rsub|n>>f<around*|(|0|)>>.
    Let <math|g<around*|(|t|)>\<assign\>ln f<around*|(|t|)>>, and define
    <math|\<kappa\><rsub|\<alpha\><rsub|1>\<cdots\>\<alpha\><rsub|n>>\<assign\>\<partial\><rsub|\<alpha\><rsub|1>>\<cdots\>\<partial\><rsub|\<alpha\><rsub|n>>g<around*|(|0|)>>.
    Then,

    <\enumerate-numeric>
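      <item>if there exist <math|f<rsub|\<alpha\>>\<in\>C<rsup|\<infty\>><around*|(|\<bbb-R\>,\<bbb-R\><rsub|+>|)>>,
      <math|\<alpha\>=1,\<ldots\>,n>, such that
      <math|f<around*|(|t|)>=<big|prod><rsub|\<alpha\>=1><rsup|n>f<rsub|\<alpha\>><around*|(|t<rsup|\<alpha\>>|)>>,
      then <math|\<kappa\><rsub|\<alpha\><rsub|1>\<cdots\>\<alpha\><rsub|m>>\<propto\>\<delta\><rsub|\<alpha\><rsub|1>\<cdots\>\<alpha\><rsub|m>>>,
      the generalized Kronecker delta, i.e. all mixed cumulants vanish;

      <item>the moments are given by the cumulants through

      <\equation>
        m<rsub|\<alpha\><rsub|1>\<cdots\>\<alpha\><rsub|n>>=<big|sum><rsub|\<pi\>\<in\>\<Pi\><around*|(|n|)>><big|prod><rsub|B\<in\>\<pi\>>\<kappa\><rsub|\<alpha\><rsub|B>>,
      </equation>

      where <math|\<Pi\><around*|(|n|)>> is the set of partitions of
      <math|<around*|{|1,\<ldots\>,n|}>> into non-empty blocks <math|B>, and
      <math|\<alpha\><rsub|B>> is the string of indices
      <math|\<alpha\><rsub|b>> with <math|b\<in\>B>. Each term is a Feynman
      diagram built from cumulants <math|\<kappa\><rsub|\<alpha\><rsub|1>\<cdots\>\<alpha\><rsub|m>>>
      with <math|m\<leqslant\>n>.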
    </enumerate-numeric>
  </lemma>

  For instance,

  <\equation*>
    m<rsub|\<alpha\>\<beta\>>=\<kappa\><rsub|\<alpha\>\<beta\>>+\<kappa\><rsub|\<alpha\>>\<kappa\><rsub|\<beta\>>,
  </equation*>

  <\equation*>
    m<rsub|\<alpha\>\<beta\>\<gamma\>>=\<kappa\><rsub|\<alpha\>\<beta\>\<gamma\>>+\<kappa\><rsub|\<alpha\>>\<kappa\><rsub|\<beta\>\<gamma\>>+\<kappa\><rsub|\<gamma\>>\<kappa\><rsub|\<alpha\>\<beta\>>+\<kappa\><rsub|\<beta\>>\<kappa\><rsub|\<gamma\>\<alpha\>>+\<kappa\><rsub|\<alpha\>>\<kappa\><rsub|\<beta\>>\<kappa\><rsub|\<gamma\>>.
  </equation*>
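
  The partition sum in Lemma <reference|Lemma: Feynman Diagram> is easy to
  enumerate mechanically. The Python sketch below is our own illustration
  (the helper names <verbatim|set_partitions> and
  <verbatim|moment_from_cumulants> are hypothetical): cumulants are supplied
  as a function of a sorted index tuple, and each partition of the index
  positions contributes the product of its block cumulants. As a check, for
  a zero-mean Gaussian only the covariance survives, and the fourth moment
  reduces to Isserlis' theorem.

```python
from math import prod


def set_partitions(items):
    """Yield every partition of the list items into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for k in range(len(partition)):
            yield partition[:k] + [[first] + partition[k]] + partition[k + 1:]
        # ... or give it a block of its own.
        yield [[first]] + partition


def moment_from_cumulants(indices, kappa):
    """m_{a1...an} = sum over partitions of the index positions of the product of block cumulants.

    kappa maps a sorted tuple of indices to the corresponding cumulant.
    """
    total = 0.0
    for partition in set_partitions(list(range(len(indices)))):
        total += prod(kappa(tuple(sorted(indices[b] for b in block))) for block in partition)
    return total


# Zero-mean Gaussian: only the second cumulants (the covariance) are non-zero.
cov = [[2.0, 0.3, -0.4], [0.3, 1.0, 0.5], [-0.4, 0.5, 1.5]]


def gaussian_kappa(idx):
    return cov[idx[0]][idx[1]] if len(idx) == 2 else 0.0


m4 = moment_from_cumulants((0, 1, 2, 2), gaussian_kappa)
isserlis = cov[0][1] * cov[2][2] + 2 * cov[0][2] * cov[1][2]  # Isserlis' theorem for m_{0122}
print(m4, isserlis)
```

  The number of terms is the Bell number (5 for three indices, 15 for
  four), so this brute-force enumeration is only meant for low orders.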

  <subsection|Moment & Cumulant>

  <\definition>
    [Moment & Cumulant]

    Let <math|p> be the distribution of a multivariate random variable
    <math|X<rsup|\<alpha\>>>. Given a constant vector <math|\<mu\>>, define
    the moment generating function as

    <\equation>
      M<rsub|p><around*|(|t;\<mu\>|)>\<assign\><big|int>\<mathd\>x
      p<around*|(|x|)> \<mathe\><rsup|t<rsub|\<alpha\>><around*|(|x<rsup|\<alpha\>>-\<mu\><rsup|\<alpha\>>|)>>.
    </equation>

    Similarly, define the cumulant generating function as

    <\equation>
      K<rsub|p><around*|(|t;\<mu\>|)>\<assign\>ln
      M<rsub|p><around*|(|t;\<mu\>|)>.
    </equation>

    The moments are then defined by

    <\equation>
      m<rsup|\<alpha\><rsub|1>\<cdots\>\<alpha\><rsub|n>><rsub|p><around*|(|\<mu\>|)>\<assign\><frac|\<partial\><rsup|n>M<rsub|p>|\<partial\>t<rsub|\<alpha\><rsub|1>>\<cdots\>\<partial\>t<rsub|\<alpha\><rsub|n>>><around*|(|0;\<mu\>|)>.
    </equation>

    Likewise, the cumulants are defined by

    <\equation>
      \<kappa\><rsup|\<alpha\><rsub|1>\<cdots\>\<alpha\><rsub|n>><rsub|p><around*|(|\<mu\>|)>\<assign\><frac|\<partial\><rsup|n>K<rsub|p>|\<partial\>t<rsub|\<alpha\><rsub|1>>\<cdots\>\<partial\>t<rsub|\<alpha\><rsub|n>>><around*|(|0;\<mu\>|)>.
    </equation>
  </definition>
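
  To make the definitions concrete, the sketch below takes <math|p> to be
  the empirical distribution of a finite sample (an assumption made for
  illustration), so that the integral in <math|M<rsub|p>> becomes a sample
  average. It estimates <math|\<kappa\><rsup|\<alpha\>\<beta\>><rsub|p><around*|(|\<mu\>|)>>
  by a central finite difference of <math|K<rsub|p>> at <math|t=0>; the step
  size <verbatim|h> and the function names are our own hypothetical choices.
  The estimate agrees with the <math|1/N> sample covariance for any shift
  <math|\<mu\>>, since <math|\<mu\>> only enters <math|K<rsub|p>> through
  the linear term <math|-t<rsub|\<alpha\>>\<mu\><rsup|\<alpha\>>>.

```python
import numpy as np


def cgf(samples, t, mu):
    """Empirical K_p(t; mu) = ln mean_x exp(t . (x - mu)); rows of samples are draws of X."""
    return np.log(np.mean(np.exp((samples - mu) @ t)))


def cumulant2(samples, mu, a, b, h=1e-3):
    """Central finite-difference estimate of d^2 K_p / dt_a dt_b at t = 0."""
    e = np.eye(samples.shape[1])
    ea, eb = h * e[a], h * e[b]
    return (cgf(samples, ea + eb, mu) - cgf(samples, ea - eb, mu)
            - cgf(samples, eb - ea, mu) + cgf(samples, -ea - eb, mu)) / (4 * h * h)


rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [-0.3, 0.2, 0.9]])
samples = rng.normal(size=(1000, 3)) @ mixing.T + np.array([1.0, -2.0, 0.5])
cov = np.cov(samples.T, bias=True)  # 1/N normalization, matching the empirical p

for mu in (np.zeros(3), np.full(3, 5.0)):
    print(cumulant2(samples, mu, 0, 1), cov[0, 1])
```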

  <\theorem>
    If the components of <math|X<rsup|\<alpha\>>> are independent, i.e.
    <math|p<around*|(|x|)>=<big|prod><rsub|\<alpha\>>p<rsub|\<alpha\>><around*|(|x<rsup|\<alpha\>>|)>>,
    then <math|\<kappa\><rsup|\<alpha\><rsub|1>\<cdots\>\<alpha\><rsub|n>><rsub|p><around*|(|\<mu\>|)>\<propto\>\<delta\><rsup|\<alpha\><rsub|1>\<cdots\>\<alpha\><rsub|n>>>,
    the generalized Kronecker delta; that is, all mixed cumulants vanish.
  </theorem>
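
  A small numerical illustration of this theorem, using a toy
  two-variable distribution with hypothetical two-point marginals: for a
  product distribution, <math|K<rsub|p>> is a sum of one-variable
  functions, so the mixed finite difference in <math|t<rsub|1>,t<rsub|2>>
  vanishes up to rounding, whereas a correlated joint distribution with the
  same marginals gives a non-zero mixed second cumulant, namely the
  covariance.

```python
import itertools

import numpy as np


def exact_cgf(support, probs, t, mu):
    """K_p(t; mu) = ln sum_x p(x) exp(t . (x - mu)) for a distribution on finitely many points."""
    return np.log(np.sum(probs * np.exp((support - mu) @ t)))


def mixed_cumulant(support, probs, mu, h=1e-3):
    """Finite-difference estimate of d^2 K_p / dt_1 dt_2 at t = 0 for two variables."""
    e1, e2 = np.array([h, 0.0]), np.array([0.0, h])

    def k(t):
        return exact_cgf(support, probs, t, mu)

    return (k(e1 + e2) - k(e1 - e2) - k(e2 - e1) + k(-e1 - e2)) / (4 * h * h)


# Toy marginals: X1 in {0, 1} with P(X1 = 1) = 0.3, X2 in {-1, 2} with P(X2 = 2) = 0.6.
p1, p2 = np.array([0.7, 0.3]), np.array([0.4, 0.6])
support = np.array(list(itertools.product([0.0, 1.0], [-1.0, 2.0])))  # (0,-1), (0,2), (1,-1), (1,2)
independent = np.outer(p1, p2).ravel()  # p(x) = p1(x1) p2(x2), in the same order as support
# Same marginals, with mass moved towards the (0, -1) and (1, 2) corners.
correlated = independent + 0.05 * np.array([1.0, -1.0, -1.0, 1.0])

mu = np.zeros(2)
print(mixed_cumulant(support, independent, mu))  # ~ 0
print(mixed_cumulant(support, correlated, mu))   # ~ 0.15 = Cov(X1, X2)
```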

  <\theorem>
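    [Relation between Moment and Cumulant]

    <\equation>
      m<rsup|\<alpha\><rsub|1>\<cdots\>\<alpha\><rsub|n>><rsub|p><around*|(|\<mu\>|)>=<big|sum><rsub|\<pi\>\<in\>\<Pi\><around*|(|n|)>><big|prod><rsub|B\<in\>\<pi\>>\<kappa\><rsup|\<alpha\><rsub|B>><rsub|p><around*|(|\<mu\>|)>,
    </equation>

    where the sum runs over all partitions <math|\<pi\>> of
    <math|<around*|{|1,\<ldots\>,n|}>> into non-empty blocks <math|B>, and
    <math|\<alpha\><rsub|B>> collects the indices <math|\<alpha\><rsub|b>>
    with <math|b\<in\>B>; each term is a Feynman diagram built from
    cumulants of order at most <math|n>.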
  </theorem>

  <\proof>
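    Apply Lemma <reference|Lemma: Feynman Diagram> with
    <math|f<around*|(|t|)>\<rightarrow\>M<rsub|p><around*|(|t;\<mu\>|)>>,
    which is positive and satisfies <math|M<rsub|p><around*|(|0;\<mu\>|)>=1>
    because <math|p> is normalized; for independent components,
    <math|M<rsub|p>> factorizes into one-variable generating functions of
    the <math|p<rsub|\<alpha\>>>. Both theorems then follow directly.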
  </proof>

  For instance,

  <\equation*>
    m<rsup|\<alpha\>\<beta\>><rsub|p><around*|(|\<mu\>|)>=\<kappa\><rsub|p><rsup|\<alpha\>\<beta\>><around*|(|\<mu\>|)>+\<kappa\><rsup|\<alpha\>><rsub|p><around*|(|\<mu\>|)>\<kappa\><rsup|\<beta\>><rsub|p><around*|(|\<mu\>|)>,
  </equation*>

  <\equation*>
    m<rsup|\<alpha\>\<beta\>\<gamma\>><rsub|p><around*|(|\<mu\>|)>=\<kappa\><rsub|p><rsup|\<alpha\>\<beta\>\<gamma\>><around*|(|\<mu\>|)>+\<kappa\><rsup|\<alpha\>><rsub|p><around*|(|\<mu\>|)>\<kappa\><rsup|\<beta\>\<gamma\>><rsub|p><around*|(|\<mu\>|)>+\<kappa\><rsup|\<gamma\>><rsub|p><around*|(|\<mu\>|)>\<kappa\><rsup|\<alpha\>\<beta\>><rsub|p><around*|(|\<mu\>|)>+\<kappa\><rsup|\<beta\>><rsub|p><around*|(|\<mu\>|)>\<kappa\><rsup|\<gamma\>\<alpha\>><rsub|p><around*|(|\<mu\>|)>+\<kappa\><rsup|\<alpha\>><rsub|p><around*|(|\<mu\>|)>\<kappa\><rsup|\<beta\>><rsub|p><around*|(|\<mu\>|)>\<kappa\><rsup|\<gamma\>><rsub|p><around*|(|\<mu\>|)>.
  </equation*>
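
  The third-order relation can be checked on data. In the sketch below (our
  own illustration), <math|p> is the empirical distribution of a skewed,
  correlated toy sample; the moments about an arbitrary <math|\<mu\>> are
  sample averages, while the cumulants up to third order are the shifted
  mean <math|\<kappa\><rsup|\<alpha\>><rsub|p>=E<around*|[|X<rsup|\<alpha\>>|]>-\<mu\><rsup|\<alpha\>>>,
  the covariance and the third central moment. The identity then holds
  exactly, up to floating-point error.

```python
import numpy as np


def third_products(u):
    """Sample average of u^a u^b u^c; rows of u are samples."""
    return (u[:, :, None, None] * u[:, None, :, None] * u[:, None, None, :]).mean(axis=0)


rng = np.random.default_rng(1)
x = rng.gamma(shape=2.0, scale=1.0, size=(2000, 3))  # skewed toy data
x[:, 1] += 0.5 * x[:, 0]  # introduce a correlation
mu = np.array([0.3, -1.0, 2.0])  # arbitrary expansion point

m3 = third_products(x - mu)  # m_p^{abc}(mu)

c = x - x.mean(axis=0)
k1 = x.mean(axis=0) - mu  # kappa^a(mu): the only cumulant that depends on mu
k2 = c.T @ c / len(x)  # kappa^{ab}: covariance
k3 = third_products(c)  # kappa^{abc}: third central moment

# m^{abc} = k^{abc} + k^a k^{bc} + k^b k^{ca} + k^c k^{ab} + k^a k^b k^c
rhs = (k3
       + k1[:, None, None] * k2[None, :, :]
       + k1[None, :, None] * k2[:, None, :]
       + k1[None, None, :] * k2[:, :, None]
       + k1[:, None, None] * k1[None, :, None] * k1[None, None, :])
print(np.max(np.abs(m3 - rhs)))
```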

  <section|Hebbian Rule>

  Herein, we furnish a mathematical formulation of the Hebbian rule.

  Hebb's postulate is often summarized as
  <with|font-shape|italic|neurons that fire together wire together>; we
  refer to it as the Hebbian rule. Mathematically, we model the activations
  of neurons as random variables <math|X<rsup|\<alpha\>>> with joint
  distribution <math|p<around*|(|x|)>>, and require that, for any indices
  <math|\<alpha\>,\<beta\>> with <math|\<alpha\>\<neq\>\<beta\>>,

  <\equation*>
    <big|int>\<mathd\>x p<around*|(|x|)> <around*|(|x<rsup|\<alpha\>>-<wide|x|^><rsup|\<alpha\>>|)><around*|(|x<rsup|\<beta\>>-<wide|x|^><rsup|\<beta\>>|)>=<wide|C|^><rsup|\<alpha\>\<beta\>>,
  </equation*>

  where <math|<wide|x|^><rsup|\<alpha\>>> denotes the mean
  <math|E<around*|(|X<rsup|\<alpha\>>|)>> and
  <math|<wide|C|^><rsup|\<alpha\>\<beta\>>> the covariance
  <math|Cov<around*|(|X<rsup|\<alpha\>>,X<rsup|\<beta\>>|)>>, both computed
  from the empirical distribution of real-world data.

  Given <math|<wide|x|^><rsup|\<alpha\>>> and
  <math|<wide|C|^><rsup|\<alpha\>\<beta\>>>, define the factorized
  distribution <math|q<around*|(|x|)>\<assign\><big|prod><rsub|\<alpha\>>q<rsub|\<alpha\>><around*|(|x<rsup|\<alpha\>>|)>>
  with <math|<big|int>\<mathd\>x<rsup|\<alpha\>>
  q<rsub|\<alpha\>><around*|(|x<rsup|\<alpha\>>|)>
  x<rsup|\<alpha\>>=<wide|x|^><rsup|\<alpha\>>>. We look for the minimal
  extension of <math|q>, that is, a new distribution <math|p> that
  satisfies the Hebbian rule above while staying as close to <math|q> as
  possible, in the sense that the KL-divergence
  <math|D<rsub|KL><around*|(|p,q|)>\<assign\><big|int>\<mathd\>x
  p<around*|(|x|)> ln<around*|[|p<around*|(|x|)>/q<around*|(|x|)>|]>> is
  minimized. Mathematically, this amounts to finding a stationary point of
  the Lagrangian

  <\align>
    <tformat|<table|<row|<cell|L<around*|(|p|)>>|<cell|=D<rsub|KL><around*|(|p,q|)>>>|<row|<cell|>|<cell|+<big|sum><rsub|\<alpha\>,\<beta\>,\<alpha\>\<neq\>\<beta\>>\<lambda\><rsub|\<alpha\>\<beta\>>
    <big|int>\<mathd\>x p<around*|(|x|)> <around*|[|<around*|(|x<rsup|\<alpha\>>-<wide|x|^><rsup|\<alpha\>>|)><around*|(|x<rsup|\<beta\>>-<wide|x|^><rsup|\<beta\>>|)>-<wide|C|^><rsup|\<alpha\>\<beta\>>|]>>>|<row|<cell|>|<cell|+\<mu\><around*|(|<big|int>\<mathd\>x
    p<around*|(|x|)>-1|)>,>>>>
  </align>

  where <math|\<lambda\><rsub|\<alpha\>\<beta\>>> and <math|\<mu\>> are
  Lagrange multipliers, and the last line enforces the normalization of
  <math|p<around*|(|x|)>>.

  <\theorem>
    [Hebbian Rule]

    Define

    <\equation>
      E<around*|(|x|)>\<assign\>-<frac|1|2><big|sum><rsub|\<alpha\>,\<beta\>,\<alpha\>\<neq\>\<beta\>>W<rsub|\<alpha\>\<beta\>>
      <around*|(|x<rsup|\<alpha\>>-<wide|x|^><rsup|\<alpha\>>|)><around*|(|x<rsup|\<beta\>>-<wide|x|^><rsup|\<beta\>>|)>,
    </equation>

    where the matrix <math|W> is symmetric with vanishing diagonal, and

    <\equation>
      ln p<around*|(|x|)>\<assign\> ln q<around*|(|x|)>-E<around*|(|x|)>-ln
      Z,
    </equation>

    where <math|Z\<assign\><big|int>\<mathd\>x q<around*|(|x|)>
    exp<around*|(|-E<around*|(|x|)>|)>> is the normalization constant. Then
    a stationary point of the Lagrangian <math|L<around*|(|p|)>> has exactly
    this form, with <math|W<rsub|\<alpha\>\<beta\>>=-<around*|(|\<lambda\><rsub|\<alpha\>\<beta\>>+\<lambda\><rsub|\<beta\>\<alpha\>>|)>>,
    so finding it is equivalent to solving for <math|W> such that, for all
    <math|\<alpha\>,\<beta\>> with <math|\<alpha\>\<neq\>\<beta\>>,

    <\equation>
      <big|int>\<mathd\>x p<around*|(|x|)>
      <around*|(|x<rsup|\<alpha\>>-<wide|x|^><rsup|\<alpha\>>|)><around*|(|x<rsup|\<beta\>>-<wide|x|^><rsup|\<beta\>>|)>=<wide|C|^><rsup|\<alpha\>\<beta\>>.
    </equation>
  </theorem>
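
  Numerically, the theorem reduces the variational problem to moment
  matching. Below is a minimal sketch, assuming binary units
  <math|x<rsup|\<alpha\>>\<in\><around*|{|-1,1|}>> (so that <math|p> can be
  enumerated exactly), a factorized <math|q> with the prescribed means, and
  made-up targets <math|<wide|C|^><rsup|\<alpha\>\<beta\>>>. Treating each
  unordered pair as one parameter, <math|\<partial\>ln
  Z/\<partial\>W<rsub|\<alpha\>\<beta\>>> equals the model moment, so
  gradient ascent on the concave dual function
  <math|<big|sum><rsub|\<alpha\>\<less\>\<beta\>>W<rsub|\<alpha\>\<beta\>><wide|C|^><rsup|\<alpha\>\<beta\>>-ln
  Z> gives the update <math|\<Delta\>W<rsub|\<alpha\>\<beta\>>\<propto\><wide|C|^><rsup|\<alpha\>\<beta\>>-\<langle\><around*|(|x<rsup|\<alpha\>>-<wide|x|^><rsup|\<alpha\>>|)><around*|(|x<rsup|\<beta\>>-<wide|x|^><rsup|\<beta\>>|)>\<rangle\><rsub|p>>.
  This is the familiar Boltzmann-machine learning rule (data correlation
  minus model correlation), swapped in here as a generic solver rather than
  a method prescribed by this note; the function names are hypothetical.

```python
import itertools

import numpy as np


def reference_distribution(x_hat):
    """All states of n binary units and an independent q with E_q[x_a] = x_hat_a."""
    states = np.array(list(itertools.product([-1.0, 1.0], repeat=len(x_hat))))
    # P(x_a = +1) = (1 + x_hat_a) / 2 gives mean x_hat_a for a +-1 unit.
    q = np.prod(np.where(states == 1.0, (1 + x_hat) / 2, (1 - x_hat) / 2), axis=1)
    return states, q


def second_moments(W, x_hat, states, q):
    """Model moments E_p[(x_a - x_hat_a)(x_b - x_hat_b)] for p = q exp(-E) / Z."""
    y = states - x_hat
    energy = -0.5 * np.sum((y @ W) * y, axis=1)  # E(x); W has a zero diagonal
    weights = q * np.exp(-energy)
    p = weights / weights.sum()
    return (p[:, None] * y).T @ y


def fit_couplings(C_hat, x_hat, lr=0.2, steps=2000):
    """Gradient ascent on the dual: W_ab += lr * (C_hat_ab - E_p[y_a y_b]) for a != b."""
    states, q = reference_distribution(x_hat)
    off_diagonal = 1.0 - np.eye(len(x_hat))
    W = np.zeros((len(x_hat), len(x_hat)))
    for _ in range(steps):
        W += lr * off_diagonal * (C_hat - second_moments(W, x_hat, states, q))
    return W, states, q


x_hat = np.array([0.2, -0.1, 0.0])
C_hat = np.array([[0.0, 0.15, 0.10], [0.15, 0.0, -0.05], [0.10, -0.05, 0.0]])  # off-diagonal targets
W, states, q = fit_couplings(C_hat, x_hat)
C_model = second_moments(W, x_hat, states, q)
print(np.round(W, 3))
print(np.round(C_model, 4))
```

  The update only ever uses the mismatch between target and model
  correlations of a pair, which is what makes it local and
  <with|font-shape|italic|Hebbian-like>; exact enumeration is feasible only
  for a handful of units, beyond which the model moments would have to be
  estimated, e.g. by sampling.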

  <section|Perturbation>

  <subsection|First Order Perturbative Solution>

  Herein, we solve the equations within a perturbative framework.

  To expand in a small parameter, we rescale
  <math|E<around*|(|x|)>\<rightarrow\>\<beta\>E<around*|(|x|)>>, where the
  constant <math|\<beta\>\<ll\>1> plays the role of the expansion parameter
  <math|\<epsilon\>> (not to be confused with the index <math|\<beta\>>).
  Thus

  <\equation>
    ln p<around*|(|x|)>\<assign\> ln q<around*|(|x|)>-\<beta\>E<around*|(|x|)>-ln
    Z,
  </equation>

  with <math|Z\<rightarrow\><big|int>\<mathd\>x q<around*|(|x|)>
  exp<around*|(|-\<beta\>E<around*|(|x|)>|)>>.

  <\lemma>
    [First Order Perturbation (Part 1)]

    Let <math|Z\<backassign\>Z<rsub|0>+\<beta\>
    Z<rsub|1>+\<beta\><rsup|2>Z<rsub|2>/2!+\<cdots\>> and
    <math|p<around*|(|x|)>\<backassign\>p<rsub|0><around*|(|x|)>+\<beta\>p<rsub|1><around*|(|x|)>+\<beta\><rsup|2>p<rsub|2><around*|(|x|)>/2!+\<cdots\>>.
    We have

    <\equation>
      Z<rsub|0>=1,
    </equation>

    <\equation>
      Z<rsub|1>=<frac|1|2><big|sum><rsub|\<alpha\>,\<beta\>,\<alpha\>\<neq\>\<beta\>>W<rsub|\<alpha\>\<beta\>>
      m<rsub|q><rsup|\<alpha\>\<beta\>><around*|(|<wide|x|^>|)>,
    </equation>

    and

    <\equation>
      p<rsub|0><around*|(|x|)>=q<around*|(|x|)>,
    </equation>

    <\equation>
      p<rsub|1><around*|(|x|)>=q<around*|(|x|)>\<times\><frac|1|2><big|sum><rsub|\<alpha\>,\<beta\>,\<alpha\>\<neq\>\<beta\>>W<rsub|\<alpha\>\<beta\>><around*|[|<around*|(|x<rsup|\<alpha\>>-<wide|x|^><rsup|\<alpha\>>|)><around*|(|x<rsup|\<beta\>>-<wide|x|^><rsup|\<beta\>>|)>-
      m<rsub|q><rsup|\<alpha\>\<beta\>><around*|(|<wide|x|^>|)>|]>.
    </equation>
  </lemma>
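
  Part 1 can be checked on a toy model (binary units, a factorized <math|q>
  with mean <math|<wide|x|^>>, and a made-up symmetric <math|W>; all
  assumptions of this sketch). For such a <math|q>,
  <math|m<rsup|\<alpha\>\<beta\>><rsub|q><around*|(|<wide|x|^>|)>=0>
  whenever <math|\<alpha\>\<neq\>\<beta\>>, because the centered factors
  have zero mean, so <math|Z<rsub|1>> vanishes. The sketch compares the
  exact <math|Z> and <math|p> at a small <math|\<beta\>> with their
  first-order expansions.

```python
import itertools

import numpy as np

beta = 1e-4  # small expansion parameter
x_hat = np.array([0.2, -0.1, 0.0])
W = np.array([[0.0, 0.8, -0.5], [0.8, 0.0, 0.3], [-0.5, 0.3, 0.0]])  # made-up, symmetric, zero diagonal

states = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))
q = np.prod(np.where(states == 1.0, (1 + x_hat) / 2, (1 - x_hat) / 2), axis=1)
y = states - x_hat
minus_E = 0.5 * np.sum((y @ W) * y, axis=1)  # -E(x) = 1/2 sum_{a != b} W_ab y_a y_b

# Exact quantities for the rescaled energy beta * E.
Z = np.sum(q * np.exp(beta * minus_E))
p = q * np.exp(beta * minus_E) / Z

# First-order coefficients as stated in the lemma.
m_q = (q[:, None] * y).T @ y  # m_q^{ab}(x_hat); off-diagonal entries vanish for a product q
Z1 = 0.5 * np.sum(W * m_q)
p1 = q * (minus_E - Z1)

print(Z1, (Z - 1) / beta, np.max(np.abs((p - q) / beta - p1)))
```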

  <\lemma>
    [First Order Perturbation (Part 2)]

    We have, for <math|\<forall\>\<alpha\>>,

    <\equation>
      m<rsub|p><rsup|\<alpha\>><around*|(|<wide|x|^>|)>=<with|math-font|cal|O><around*|(|\<beta\><rsup|2>|)>,
    </equation>

    and for <math|\<forall\>\<alpha\>,\<beta\>> with
    <math|\<alpha\>\<neq\>\<beta\>>,

    <\equation>
      <wide|C|^><rsup|\<alpha\>\<beta\>>=\<beta\> W<rsub|\<alpha\>\<beta\>>
      m<rsup|\<alpha\>\<alpha\>><rsub|q><around*|(|<wide|x|^>|)>
      m<rsup|\<beta\>\<beta\>><rsub|q><around*|(|<wide|x|^>|)>+<with|math-font|cal|O><around*|(|\<beta\><rsup|2>|)>.
    </equation>
  </lemma>

  <\proof>
    For brevity, we adopt the abbreviations

    <\align>
      <tformat|<table|<row|<cell|W<rsub|\<alpha\>\<alpha\>>>|<cell|\<rightarrow\>0>>|<row|<cell|<big|sum><rsub|\<alpha\>,\<beta\>,\<alpha\>\<neq\>\<beta\>>>|<cell|\<rightarrow\><text|implicit summation over repeated indices>>>|<row|<cell|m<rsub|q><rsup|\<alpha\>\<cdots\>><around*|(|<wide|x|^>|)>>|<cell|\<rightarrow\>m<rsup|\<alpha\>\<cdots\>>>>|<row|<cell|\<kappa\><rsub|q><rsup|\<alpha\>\<cdots\>><around*|(|<wide|x|^>|)>>|<cell|\<rightarrow\>\<kappa\><rsup|\<alpha\>\<cdots\>>>>>>
    </align>

    Recall that <math|\<kappa\><rsup|\<alpha\>\<cdots\>>> is diagonal because
    <math|q> factorizes, and that <math|\<kappa\><rsup|\<alpha\>>=0> because
    the mean of <math|q> is <math|<wide|x|^>>.

    Directly, for <math|\<forall\>\<alpha\>>,

    <\align>
      <tformat|<table|<row|<cell|m<rsub|p><rsup|\<alpha\>><around*|(|<wide|x|^>|)>\<assign\>>|<cell|<big|int>\<mathd\>x
      p<around*|(|x|)><around*|(|x<rsup|\<alpha\>>-<wide|x|^><rsup|\<alpha\>>|)>>>|<row|<cell|<around*|{|p<rsub|0>,p<rsub|1>=\<cdots\>|}>=>|<cell|<big|int>\<mathd\>x
      q<around*|(|x|)><around*|(|x<rsup|\<alpha\>>-<wide|x|^><rsup|\<alpha\>>|)>>>|<row|<cell|+>|<cell|\<beta\><big|int>\<mathd\>x
      q<around*|(|x|)><frac|1|2>W<rsub|\<alpha\><rprime|'>\<beta\><rprime|'>>
      <around*|[|<around*|(|x<rsup|\<alpha\><rprime|'>>-<wide|x|^><rsup|\<alpha\><rprime|'>>|)><around*|(|x<rsup|\<beta\><rprime|'>>-<wide|x|^><rsup|\<beta\><rprime|'>>|)>-
      m<rsup|\<alpha\><rprime|'>\<beta\><rprime|'>>|]><around*|(|x<rsup|\<alpha\>>-<wide|x|^><rsup|\<alpha\>>|)>>>|<row|<cell|+>|<cell|<with|math-font|cal|O><around*|(|\<beta\><rsup|2>|)>>>|<row|<cell|<around*|{|m<rsup|\<alpha\>\<cdots\>>\<assign\>\<cdots\>|}>=>|<cell|m<rsup|\<alpha\>>>>|<row|<cell|+>|<cell|\<beta\><frac|1|2>W<rsub|\<alpha\><rprime|'>\<beta\><rprime|'>><around*|(|m<rsup|\<alpha\><rprime|'>\<beta\><rprime|'>\<alpha\>>-m<rsup|\<alpha\><rprime|'>\<beta\><rprime|'>>m<rsup|\<alpha\>>|)>>>|<row|<cell|+>|<cell|<with|math-font|cal|O><around*|(|\<beta\><rsup|2>|)>>>|<row|<cell|<around*|{|Feynman
      diagram|}>=>|<cell|\<kappa\><rsup|\<alpha\>>>>|<row|<cell|+>|<cell|\<beta\><frac|1|2>W<rsub|\<alpha\><rprime|'>\<beta\><rprime|'>><around*|(|<text|non-diagonal>+\<kappa\><rsup|\<alpha\>>\<cdots\>|)>>>|<row|<cell|+>|<cell|<with|math-font|cal|O><around*|(|\<beta\><rsup|2>|)>>>|<row|<cell|<around*|{|\<kappa\><rsup|\<alpha\>\<cdots\>>
      is diagonal,\<kappa\><rsup|\<alpha\>>=0|}>=>|<cell|0>>|<row|<cell|+>|<cell|0>>|<row|<cell|+>|<cell|<with|math-font|cal|O><around*|(|\<beta\><rsup|2>|)>.>>>>
    </align>

    Directly, for <math|\<forall\>\<alpha\>,\<beta\>>,

    <\align>
      <tformat|<table|<row|<cell|m<rsup|\<alpha\>\<beta\>><rsub|p><around*|(|<wide|x|^>|)>\<assign\>>|<cell|<big|int>\<mathd\>x
      p<around*|(|x|)><around*|(|x<rsup|\<alpha\>>-<wide|x|^><rsup|\<alpha\>>|)><around*|(|x<rsup|\<beta\>>-<wide|x|^><rsup|\<beta\>>|)>>>|<row|<cell|<around*|{|p<rsub|0>,p<rsub|1>=\<cdots\>|}>=>|<cell|<big|int>\<mathd\>x
      q<around*|(|x|)><around*|(|x<rsup|\<alpha\>>-<wide|x|^><rsup|\<alpha\>>|)><around*|(|x<rsup|\<beta\>>-<wide|x|^><rsup|\<beta\>>|)>>>|<row|<cell|+>|<cell|\<beta\><big|int>\<mathd\>x
      q<around*|(|x|)><frac|1|2>W<rsub|\<alpha\><rprime|'>\<beta\><rprime|'>>
      <around*|[|<around*|(|x<rsup|\<alpha\><rprime|'>>-<wide|x|^><rsup|\<alpha\><rprime|'>>|)><around*|(|x<rsup|\<beta\><rprime|'>>-<wide|x|^><rsup|\<beta\><rprime|'>>|)>-
      m<rsup|\<alpha\><rprime|'>\<beta\><rprime|'>>|]><around*|(|x<rsup|\<alpha\>>-<wide|x|^><rsup|\<alpha\>>|)><around*|(|x<rsup|\<beta\>>-<wide|x|^><rsup|\<beta\>>|)>>>|<row|<cell|+>|<cell|<with|math-font|cal|O><around*|(|\<beta\><rsup|2>|)>>>|<row|<cell|<around*|{|m<rsup|\<alpha\>\<cdots\>>\<assign\>\<cdots\>|}>=>|<cell|m<rsup|\<alpha\>\<beta\>>>>|<row|<cell|+>|<cell|\<beta\><frac|1|2>W<rsub|\<alpha\><rprime|'>\<beta\><rprime|'>><around*|(|m<rsup|\<alpha\><rprime|'>\<beta\><rprime|'>\<alpha\>\<beta\>>-m<rsup|\<alpha\><rprime|'>\<beta\><rprime|'>>m<rsup|\<alpha\>\<beta\>>|)>>>|<row|<cell|+>|<cell|<with|math-font|cal|O><around*|(|\<beta\><rsup|2>|)>>>|<row|<cell|<around*|{|Feynman
      diagram|}>=>|<cell|\<kappa\><rsup|\<alpha\>\<beta\>>+\<kappa\><rsup|\<alpha\>>\<kappa\><rsup|\<beta\>>>>|<row|<cell|+>|<cell|\<beta\>W<rsub|\<alpha\><rprime|'>\<beta\><rprime|'>>\<kappa\><rsup|\<alpha\><rprime|'>\<alpha\>>\<kappa\><rsup|\<beta\><rprime|'>\<beta\>>+\<beta\>W<rsub|\<alpha\><rprime|'>\<beta\><rprime|'>><around*|(|\<kappa\><rsup|\<alpha\><rprime|'>>\<kappa\><rsup|\<alpha\>>\<kappa\><rsup|\<beta\><rprime|'>\<beta\>>+\<kappa\><rsup|\<alpha\><rprime|'>\<alpha\>>\<kappa\><rsup|\<beta\><rprime|'>>\<kappa\><rsup|\<beta\>>|)>>>|<row|<cell|+>|<cell|<with|math-font|cal|O><around*|(|\<beta\><rsup|2>|)>>>|<row|<cell|<around*|{|\<kappa\><rsup|\<alpha\>\<cdots\>>
      is diagonal,\<kappa\><rsup|\<alpha\>>=0|}>=>|<cell|\<kappa\><rsup|\<alpha\>\<alpha\>>\<delta\><rsup|\<alpha\>\<beta\>>>>|<row|<cell|+>|<cell|\<beta\>W<rsub|\<alpha\>\<beta\>>\<kappa\><rsup|\<alpha\>\<alpha\>>\<kappa\><rsup|\<beta\>\<beta\>>>>|<row|<cell|+>|<cell|<with|math-font|cal|O><around*|(|\<beta\><rsup|2>|)>.>>>>
    </align>

    Thus, for <math|\<forall\>\<alpha\>>,

    <\align>
      <tformat|<table|<row|<cell|m<rsub|p><rsup|\<alpha\>\<alpha\>><around*|(|<wide|x|^>|)>:=>|<cell|<big|int>\<mathd\>x
      p<around*|(|x|)><around*|(|x<rsup|\<alpha\>>-<wide|x|^><rsup|\<alpha\>>|)><rsup|2>>>|<row|<cell|=>|<cell|\<kappa\><rsup|\<alpha\>\<alpha\>>+<with|math-font|cal|O><around*|(|\<beta\><rsup|2>|)>>>|<row|<cell|<around*|{|\<kappa\><rsup|\<alpha\>>=0|}>=>|<cell|m<rsup|\<alpha\>\<alpha\>>+<with|math-font|cal|O><around*|(|\<beta\><rsup|2>|)>,>>>>
    </align>

    and for <math|\<forall\>\<alpha\>,\<beta\>> with
    <math|\<alpha\>\<neq\>\<beta\>>, from the equations

    <\align>
      <tformat|<table|<row|<cell|<wide|C|^><rsup|\<alpha\>\<beta\>>=>|<cell|<big|int>\<mathd\>x
      p<around*|(|x|)><around*|(|x<rsup|\<alpha\>>-<wide|x|^><rsup|\<alpha\>>|)><around*|(|x<rsup|\<beta\>>-<wide|x|^><rsup|\<beta\>>|)>>>|<row|<cell|=>|<cell|\<beta\>W<rsub|\<alpha\>\<beta\>>\<kappa\><rsup|\<alpha\>\<alpha\>>\<kappa\><rsup|\<beta\>\<beta\>>+<with|math-font|cal|O><around*|(|\<beta\><rsup|2>|)>>>|<row|<cell|<around*|{|\<kappa\><rsup|\<alpha\>>=0|}>=>|<cell|\<beta\>W<rsub|\<alpha\>\<beta\>>m<rsup|\<alpha\>\<alpha\>>m<rsup|\<beta\>\<beta\>>+<with|math-font|cal|O><around*|(|\<beta\><rsup|2>|)>.>>>>
    </align>

    \;
  </proof>
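
  The two statements of Part 2 can likewise be checked by exact
  enumeration on the same toy model (binary units and a made-up <math|W>,
  assumptions of this sketch): the off-diagonal covariance about
  <math|<wide|x|^>> approaches <math|\<beta\>W<rsub|\<alpha\>\<beta\>>m<rsup|\<alpha\>\<alpha\>><rsub|q>m<rsup|\<beta\>\<beta\>><rsub|q>>
  with an <math|<with|math-font|cal|O><around*|(|\<beta\><rsup|2>|)>>
  remainder, and the mean shift <math|m<rsup|\<alpha\>><rsub|p><around*|(|<wide|x|^>|)>>
  shrinks by a factor of about 100 when <math|\<beta\>> is reduced tenfold,
  as expected for an <math|<with|math-font|cal|O><around*|(|\<beta\><rsup|2>|)>>
  term.

```python
import itertools

import numpy as np

x_hat = np.array([0.2, -0.1, 0.0])
W = np.array([[0.0, 0.8, -0.5], [0.8, 0.0, 0.3], [-0.5, 0.3, 0.0]])  # made-up couplings
states = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))
q = np.prod(np.where(states == 1.0, (1 + x_hat) / 2, (1 - x_hat) / 2), axis=1)
y = states - x_hat
var_q = q @ (y * y)  # m_q^{aa}(x_hat); equals 1 - x_hat_a^2 for +-1 units
off = 1.0 - np.eye(3)  # mask for alpha != beta


def moments(beta):
    """Exact mean shift m_p^a(x_hat) and second moments about x_hat for p = q exp(-beta E) / Z."""
    weights = q * np.exp(0.5 * beta * np.sum((y @ W) * y, axis=1))  # -beta E = beta/2 sum W y y
    p = weights / weights.sum()
    return p @ y, (p[:, None] * y).T @ y


for beta in (1e-2, 1e-3):
    shift, C = moments(beta)
    residual = off * (C - beta * W * np.outer(var_q, var_q))  # remainder of the first-order formula
    print(beta, np.max(np.abs(shift)), np.max(np.abs(residual)))
```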

  <\theorem>
    [First Order Perturbation Solution]

    Let <math|<wide|\<sigma\>|^><rsup|\<alpha\>>> be the standard deviation
    of <math|q<rsub|\<alpha\>>>, i.e. <math|<wide|\<sigma\>|^><rsup|\<alpha\>>\<assign\><sqrt|m<rsup|\<alpha\>\<alpha\>><rsub|q><around*|(|<wide|x|^>|)>>>.
    Define the standardizing function <math|z<rsup|\<alpha\>><around*|(|x<rsup|\<alpha\>>|)>\<assign\><around*|(|x<rsup|\<alpha\>>-<wide|x|^><rsup|\<alpha\>>|)>/<wide|\<sigma\>|^><rsup|\<alpha\>>>
    and the correlation coefficients <math|<wide|\<rho\>|^><rsub|\<alpha\>\<beta\>>\<assign\><wide|C|^><rsup|\<alpha\>\<beta\>>/<around*|(|<wide|\<sigma\>|^><rsup|\<alpha\>><wide|\<sigma\>|^><rsup|\<beta\>>|)>>.
    By the first-order perturbation lemma (Part 2),
    <math|\<beta\>W<rsub|\<alpha\>\<beta\>>=<wide|\<rho\>|^><rsub|\<alpha\>\<beta\>>/<around*|(|<wide|\<sigma\>|^><rsup|\<alpha\>><wide|\<sigma\>|^><rsup|\<beta\>>|)>+<with|math-font|cal|O><around*|(|\<beta\><rsup|2>|)>>,
    so the solution for <math|E> is

    <\equation*>
      \<beta\>E<around*|(|x|)>=-<frac|1|2><big|sum><rsub|\<alpha\>,\<beta\>,\<alpha\>\<neq\>\<beta\>><wide|\<rho\>|^><rsub|\<alpha\>\<beta\>>
      z<rsup|\<alpha\>><around*|(|x<rsup|\<alpha\>>|)>
      z<rsup|\<beta\>><around*|(|x<rsup|\<beta\>>|)>+<with|math-font|cal|O><around*|(|\<beta\><rsup|2>|)>.
    </equation*>

    This formulation is invariant under rescaling and shift transformations
    <math|x<rsup|\<alpha\>>\<rightarrow\>a<rsup|\<alpha\>>
    x<rsup|\<alpha\>>+b<rsup|\<alpha\>>>, where <math|a> and <math|b> are
    constant vectors and every <math|a<rsup|\<alpha\>>\<neq\>0>.
  </theorem>
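
  A sketch of how the first-order solution might be evaluated from data,
  assuming (our assumption) that each <math|q<rsub|\<alpha\>>> carries the
  empirical mean and variance of unit <math|\<alpha\>>, so that
  <math|<wide|\<sigma\>|^><rsup|\<alpha\>>> is the sample standard
  deviation and <math|<wide|\<rho\>|^>> the sample Pearson correlation.
  The hypothetical function <verbatim|first_order_energy> returns
  <math|\<beta\>E<around*|(|x|)>> at a query point; applying the same map
  <math|x<rsup|\<alpha\>>\<rightarrow\>a<rsup|\<alpha\>>x<rsup|\<alpha\>>+b<rsup|\<alpha\>>>
  to the data and to the query leaves the value unchanged. Negative
  <math|a<rsup|\<alpha\>>> are allowed, because the sign flips of
  <math|<wide|\<rho\>|^>> and <math|z> cancel.

```python
import numpy as np


def first_order_energy(data, x):
    """beta * E(x) = -1/2 sum_{a != b} rho_ab z_a z_b with statistics taken from the rows of data."""
    x_hat = data.mean(axis=0)
    sigma = data.std(axis=0)
    rho = np.corrcoef(data, rowvar=False)
    np.fill_diagonal(rho, 0.0)  # the sum excludes a == b
    z = (x - x_hat) / sigma  # standardizing function z^a
    return -0.5 * z @ rho @ z


rng = np.random.default_rng(2)
cov = [[1.0, 0.3, 0.1], [0.3, 2.0, -0.4], [0.1, -0.4, 0.5]]
data = rng.multivariate_normal([0.0, 1.0, -1.0], cov, size=500)
x = np.array([0.5, 0.2, -1.3])

a = np.array([2.0, -0.5, 10.0])  # per-unit rescaling (non-zero, signs allowed)
b = np.array([3.0, 1.0, -7.0])  # per-unit shift
print(first_order_energy(data, x), first_order_energy(data * a + b, x * a + b))
```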

  <subsection|Validation of the Perturbative Solution>

  Herein we introduce a \Pzoom-in\Q trick that de-correlates real-world
  data, even when they are highly correlated originally, so that the
  perturbative solution becomes valid for them.

  <\lemma>
    [De-correlation]

    For each <math|i=1,\<ldots\>,n>, let the random variables
    <math|<around*|{|X<rsub|i\<alpha\>>\|\<alpha\>=1,\<ldots\>,m<rsub|i>|}>>
    be i.i.d. with respect to the index <math|\<alpha\>>. Define

    <\equation>
      Y<rsub|i>\<assign\>a<rsub|i> <big|sum><rsub|\<alpha\>><rsup|m<rsub|i>>X<rsub|i\<alpha\>>+b<rsub|i>,
    </equation>

    where <math|a<rsub|i>\<gtr\>0> and <math|b<rsub|i>> are constants.
    Writing <math|C> for the covariance and <math|\<rho\>> for the Pearson
    correlation coefficient, we have, for every <math|i> and arbitrary
    <math|\<alpha\>=1,\<ldots\>,m<rsub|i>>,

    <\equation>
      C<around*|(|Y<rsub|i>,Y<rsub|i>|)>=a<rsub|i><rsup|2> m<rsub|i>
      C<around*|(|X<rsub|i\<alpha\>>,X<rsub|i\<alpha\>>|)>,
    </equation>

    and for <math|\<forall\>i,j> with <math|i\<neq\>j> with arbitrary
    <math|\<alpha\>=1,\<ldots\>,m<rsub|i>> and
    <math|\<beta\>=1,\<ldots\>,m<rsub|j>>,

    <\equation>
      C<around*|(|Y<rsub|i>,Y<rsub|j>|)>=a<rsub|i> a<rsub|j> m<rsub|i>
      m<rsub|j> C<around*|(|X<rsub|i\<alpha\>>,X<rsub|j\<beta\>>|)>,
    </equation>

    and thus,

    <\equation>
      \<rho\><around*|(|Y<rsub|i>,Y<rsub|j>|)>=<sqrt|m<rsub|i> m<rsub|j>>
      \<rho\><around*|(|X<rsub|i\<alpha\>>,X<rsub|j\<beta\>>|)>.
    </equation>
  </lemma>

  <\proof>
    Since the covariance <math|C> is bilinear and invariant under constant
    shifts, we have

    <\align>
      <tformat|<table|<row|<cell|C<around*|(|Y<rsub|i>,Y<rsub|j>|)>>|<cell|=C<around*|(|a<rsub|i>
      <big|sum><rsub|\<alpha\>><rsup|m<rsub|i>>X<rsub|i\<alpha\>>+b<rsub|i>,a<rsub|j>
      <big|sum><rsub|\<beta\>><rsup|m<rsub|j>>X<rsub|j\<beta\>>+b<rsub|j>|)>>>|<row|<cell|<around*|{|Shifting
      symmetry|}>>|<cell|=C<around*|(|a<rsub|i>
      <big|sum><rsub|\<alpha\>><rsup|m<rsub|i>>X<rsub|i\<alpha\>>,a<rsub|j>
      <big|sum><rsub|\<beta\>><rsup|m<rsub|j>>X<rsub|j\<beta\>>|)>>>|<row|<cell|<around*|{|Bilinear|}>>|<cell|=a<rsub|i>
      a<rsub|j><big|sum><rsub|\<alpha\>><rsup|m<rsub|i>>
      <big|sum><rsub|\<beta\>><rsup|m<rsub|j>>C<around*|(|X<rsub|i\<alpha\>>,X<rsub|j\<beta\>>|)>.>>>>
    </align>

    For <math|i=j>, since the <math|X<rsub|i\<alpha\>>> are i.i.d. in
    <math|\<alpha\>>, we have
    <math|C<around*|(|X<rsub|i\<alpha\>>,X<rsub|i\<beta\>>|)>\<propto\>\<delta\><rsub|\<alpha\>\<beta\>>>
    and <math|C<around*|(|X<rsub|i\<alpha\>>,X<rsub|i\<alpha\>>|)>\<equiv\>C<around*|(|X<rsub|i\<beta\>>,X<rsub|i\<beta\>>|)>>
    for all <math|\<alpha\>,\<beta\>>, and hence

    <\align>
      <tformat|<table|<row|<cell|C<around*|(|Y<rsub|i>,Y<rsub|i>|)>>|<cell|=a<rsub|i><rsup|2><big|sum><rsub|\<alpha\>><rsup|m<rsub|i>>
      <big|sum><rsub|\<beta\>><rsup|m<rsub|i>>C<around*|(|X<rsub|i\<alpha\>>,X<rsub|i\<beta\>>|)>>>|<row|<cell|<around*|{|C<around*|(|X<rsub|i\<alpha\>>,X<rsub|i\<beta\>>|)>\<propto\>\<delta\><rsub|\<alpha\>\<beta\>>|}>>|<cell|=a<rsub|i><rsup|2><big|sum><rsub|\<alpha\>><rsup|m<rsub|i>>
      <big|sum><rsub|\<beta\>><rsup|m<rsub|i>>C<around*|(|X<rsub|i\<alpha\>>,X<rsub|i\<beta\>>|)>\<delta\><rsub|\<alpha\>\<beta\>>>>|<row|<cell|<around*|{|\<forall\>\<alpha\>,\<beta\>,C<around*|(|X<rsub|i\<alpha\>>,X<rsub|i\<alpha\>>|)>\<equiv\>C<around*|(|X<rsub|i\<beta\>>,X<rsub|i\<beta\>>|)>|}>>|<cell|=a<rsub|i><rsup|2>
      m<rsub|i> C<around*|(|X<rsub|i\<alpha\>>,X<rsub|i\<alpha\>>|)>.>>>>
    </align>

    For <math|i\<neq\>j>, <math|X<rsub|i\<alpha\>>> and
    <math|X<rsub|j\<beta\>>> need not be independent, i.e.
    <math|C<around*|(|X<rsub|i\<alpha\>>,X<rsub|j\<beta\>>|)>> may not
    vanish. However, since the <math|X<rsub|i\<alpha\>>> are i.i.d. in
    <math|\<alpha\>>, and likewise the <math|X<rsub|j\<beta\>>> in
    <math|\<beta\>>, the cross-covariance is taken to be independent of the
    replica labels,
    <math|C<around*|(|X<rsub|i\<alpha\>>,X<rsub|j\<beta\>>|)>\<equiv\>C<around*|(|X<rsub|i\<alpha\><rprime|'>>,X<rsub|j\<beta\><rprime|'>>|)>>
    for all <math|<around*|(|\<alpha\>,\<alpha\><rprime|'>|)>,<around*|(|\<beta\>,\<beta\><rprime|'>|)>>.
    Thus,

    <\align>
      <tformat|<table|<row|<cell|C<around*|(|Y<rsub|i>,Y<rsub|j>|)>>|<cell|=a<rsub|i>
      a<rsub|j><big|sum><rsub|\<alpha\>><rsup|m<rsub|i>>
      <big|sum><rsub|\<beta\>><rsup|m<rsub|j>>C<around*|(|X<rsub|i\<alpha\>>,X<rsub|j\<beta\>>|)>>>|<row|<cell|<around*|{|C<around*|(|X<rsub|i\<alpha\>>,X<rsub|j\<beta\>>|)>\<equiv\>C<around*|(|X<rsub|i\<alpha\><rprime|'>>,X<rsub|j\<beta\><rprime|'>>|)>|}>>|<cell|=a<rsub|i>
      a<rsub|j>m<rsub|i> m<rsub|j>C<around*|(|X<rsub|i\<alpha\>>,X<rsub|j\<beta\>>|)>.>>>>
    </align>

    Thus, by definition of Pearson correlation coefficients,

    <\align>
      <tformat|<table|<row|<cell|\<rho\><around*|(|Y<rsub|i>,Y<rsub|j>|)>>|<cell|=<frac|C<around*|(|Y<rsub|i>,Y<rsub|j>|)>|<sqrt|C<around*|(|Y<rsub|i>,Y<rsub|i>|)>
      C<around*|(|Y<rsub|j>,Y<rsub|j>|)>>>>>|<row|<cell|<around*|{|Plug
      in|}>>|<cell|=<frac|a<rsub|i> a<rsub|j>m<rsub|i>
      m<rsub|j>C<around*|(|X<rsub|i\<alpha\>>,X<rsub|j\<beta\>>|)>|<sqrt|a<rsub|i><rsup|2>
      m<rsub|i> C<around*|(|X<rsub|i\<alpha\>>,X<rsub|i\<alpha\>>|)>
      a<rsub|j><rsup|2> m<rsub|j> C<around*|(|X<rsub|j\<beta\>>,X<rsub|j\<beta\>>|)>>>>>|<row|<cell|<around*|{|Arithmetic,
      a<rsub|i>a<rsub|j>\<gtr\>0|}>>|<cell|=<sqrt|m<rsub|i>
      m<rsub|j>> <frac|C<around*|(|X<rsub|i\<alpha\>>,X<rsub|j\<beta\>>|)>|<sqrt|C<around*|(|X<rsub|i\<alpha\>>,X<rsub|i\<alpha\>>|)>
      C<around*|(|X<rsub|j\<beta\>>,X<rsub|j\<beta\>>|)>>>>>|<row|<cell|<around*|{|Definition
      of \<rho\>|}>>|<cell|=<sqrt|m<rsub|i> m<rsub|j>>
      \<rho\><around*|(|X<rsub|i\<alpha\>>,X<rsub|j\<beta\>>|)>.>>>>
    </align>

    This establishes all three relations.
  </proof>
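
  The relations can be checked exactly at the level of covariance
  matrices. The sketch below assumes, for illustration, two groups with
  <math|m<rsub|1>=4> and <math|m<rsub|2>=9> replicas, unit replica variance
  and a common cross-covariance <math|c=0.05>. It builds the covariance of
  the stacked replicas under the lemma's hypotheses, maps it to the
  <math|Y<rsub|i>> with positive <math|a<rsub|i>> (the shifts
  <math|b<rsub|i>> drop out), and compares
  <math|\<rho\><around*|(|Y<rsub|1>,Y<rsub|2>|)>> with
  <math|<sqrt|m<rsub|1>m<rsub|2>>\<rho\><around*|(|X<rsub|1\<alpha\>>,X<rsub|2\<beta\>>|)>>.
  The helper names are hypothetical.

```python
import numpy as np


def replica_covariance(m, var, cross):
    """Covariance of stacked replicas X_{i alpha}: var[i] * I within group i,
    and the constant cross[i][j] between any replica of group i and any of group j."""
    blocks = [[var[i] * np.eye(mi) if i == j else cross[i][j] * np.ones((mi, mj))
               for j, mj in enumerate(m)] for i, mi in enumerate(m)]
    return np.block(blocks)


def aggregate(cov_x, m, a):
    """Covariance of Y_i = a_i * sum_alpha X_{i alpha} + b_i (the shifts b_i drop out)."""
    A = np.zeros((len(m), sum(m)))
    start = 0
    for i, mi in enumerate(m):
        A[i, start:start + mi] = a[i]
        start += mi
    return A @ cov_x @ A.T


def correlation(cov):
    s = np.sqrt(np.diag(cov))
    return cov / np.outer(s, s)


m, var, c = [4, 9], [1.0, 1.0], 0.05
cov_x = replica_covariance(m, var, [[0.0, c], [c, 0.0]])
cov_y = aggregate(cov_x, m, a=[0.5, 2.0])
rho_x = c / np.sqrt(var[0] * var[1])
print(correlation(cov_y)[0, 1], np.sqrt(m[0] * m[1]) * rho_x)  # both approximately 0.3
```

  Read in reverse, this factor is the point of the construction: a
  correlation of 0.3 between the <math|Y<rsub|i>> corresponds to only 0.05
  between individual replicas.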

  Next, we show how to construct the replicas <math|X<rsub|\<alpha\>>>
  from a single <math|Y>.

  <\lemma>
    [Zoom-in Trick]

    Given a random variable <math|Y> and a conditional distribution
    <math|P<around*|(|X\|Y|)>> with <math|\<bbb-E\><rsub|P<around*|(|X\|Y=y|)>><around*|[|X|]>=y>,
    let <math|<around*|{|X<rsub|\<mu\>>\|\<mu\>=1,\<ldots\>,m|}>> be random
    variables that are i.i.d. given <math|Y>, each obeying
    <math|P<around*|(|X\|Y|)>>, and define

    <\equation*>
      Z<rsup|<around*|(|m|)>>\<assign\><frac|1|m><big|sum><rsub|\<mu\>=1><rsup|m>X<rsub|\<mu\>>.
    </equation*>

    Then, we have,

    <\equation*>
      lim<rsub|m\<rightarrow\>+\<infty\>>Z<rsup|<around*|(|m|)>>=Y.
    </equation*>
  </lemma>

  <\proof>
    Sample <math|<around*|{|y<rsub|i><mid|\|>i=1,\<ldots\>|}>> from
    <math|P<around*|(|Y|)>>. The decomposition

    <\equation*>
      P<around*|(|X|)>=<big|int>\<mathd\>y
      P<around*|(|X\|Y=y|)>P<around*|(|Y=y|)>
    </equation*>

    means that replicas can be drawn from each <math|y<rsub|i>> as
    <math|x<rsub|i\<mu\>>\<sim\>P<around*|(|X\|Y=y<rsub|i>|)>>,
    <math|\<mu\>=1,\<ldots\>,m>. Then define

    <\equation*>
      z<rsub|i><rsup|<around*|(|m|)>>\<assign\><frac|1|m><big|sum><rsub|\<mu\>=1><rsup|m>x<rsub|i\<mu\>>.
    </equation*>

    For fixed <math|y<rsub|i>>, the <math|x<rsub|i\<mu\>>> are i.i.d. with
    mean <math|y<rsub|i>>, so by the law of large numbers
    <math|z<rsub|i><rsup|<around*|(|m|)>>\<rightarrow\>y<rsub|i>> as
    <math|m\<rightarrow\>+\<infty\>>. That is to say,
    <math|Z<rsup|<around*|(|m|)>>\<rightarrow\>Y>.
  </proof>
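
  A small simulation of the zoom-in construction, assuming for
  illustration a Gaussian conditional <math|P<around*|(|X\|Y=y|)>> with
  mean <math|y> and a hypothetical noise scale <math|s>: each
  <math|y<rsub|i>> is replicated <math|m> times, and the replica average
  <math|z<rsub|i><rsup|<around*|(|m|)>>> approaches <math|y<rsub|i>> with a
  root-mean-square error close to <math|s/<sqrt|m>>.

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(loc=2.0, scale=1.5, size=200)  # toy stand-in for real-world values y_i
s = 0.8  # noise scale of the assumed conditional P(X given Y = y) = N(y, s^2)

rmse = {}
for m in (1, 100, 10000):
    x = y[:, None] + s * rng.normal(size=(y.size, m))  # replicas x_{i mu} drawn around y_i
    z = x.mean(axis=1)  # z_i^(m) = (1/m) sum_mu x_{i mu}
    rmse[m] = np.sqrt(np.mean((z - y) ** 2))
    print(m, rmse[m], s / np.sqrt(m))  # observed error vs. the s / sqrt(m) rate
```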

  Thus, given a set of real-world data
  <math|<around*|{|y<rsub|i><rsup|\<alpha\>>\|i=1,\<ldots\>,D,\<alpha\>=1,\<ldots\>,n|}>>,
  we replicate each <math|y<rsup|\<alpha\>><rsub|i>> by sampling
  <math|x<rsup|\<alpha\>\<beta\>><rsub|i>>,
  <math|\<beta\>=1,\<ldots\>,m<rsub|\<alpha\>>>, from a given distribution
  whose mean is <math|y<rsup|\<alpha\>><rsub|i>>. By the first lemma, the
  Pearson correlation coefficients between replicas of different units are
  smaller than those of the corresponding <math|Y<rsup|\<alpha\>>> by the
  factors <math|1/<sqrt|m<rsub|\<alpha\>>m<rsub|\<gamma\>>>>, so the
  perturbative solution can be established on the
  <math|x<rsup|\<alpha\>\<beta\>><rsub|i>>. During inference by
  activation, once the replicas have relaxed to
  <math|<wide|x|~><rsup|\<alpha\>\<beta\>><rsub|i>>, we recover the relaxed
  <math|<wide|y|~><rsup|\<alpha\>><rsub|i>> through
  <math|<wide|z|~><rsub|i><rsup|\<alpha\>>\<assign\><around*|(|1/m<rsub|\<alpha\>>|)><big|sum><rsub|\<beta\>=1><rsup|m<rsub|\<alpha\>>><wide|x|~><rsub|i><rsup|\<alpha\>\<beta\>>>.
  By the second lemma, <math|<wide|z|~><rsup|\<alpha\>><rsub|i>>
  approximates <math|<wide|y|~><rsup|\<alpha\>><rsub|i>> when
  <math|m<rsub|\<alpha\>>> is large enough.

  <section|Renormalization Group: a Probabilistic Perspective>

  In this section, we discuss a general approach to the renormalization
  group based on a probabilistic description.

  <subsection|Example: Binary Boltzmann Machine>

  <\definition>
    [Binary Boltzmann Machine]

    Let <math|v\<in\><around*|{|-1,1|}><rsup|n>> and
    <math|h\<in\><around*|{|-1,1|}><rsup|m>> be the visible and hidden units,
    with energy (repeated indices are summed)

    <\equation*>
      E<around*|(|v,h|)>=-<frac|1|2>J<rsub|\<alpha\>\<beta\>>
      v<rsup|\<alpha\>> v<rsup|\<beta\>>-a<rsub|\<alpha\>>
      v<rsup|\<alpha\>>-<frac|1|2> L<rsub|i j> h<rsup|i> h<rsup|j>-b<rsub|i>
      h<rsup|i>-U<rsub|\<alpha\> i> v<rsup|\<alpha\>> h<rsup|i>,
    </equation*>

    where <math|J<rsub|\<alpha\>\<alpha\>>\<equiv\>L<rsub|i i>\<equiv\>0>.
  </definition>

  Marginalizing out the hidden units gives the free energy of the visible
  units:

  <\align>
    <tformat|<table|<row|<cell|E<around*|(|v|)>\<assign\>>|<cell|-ln<around*|[|<around*|(|<big|prod><rsub|i=1><rsup|m><big|sum><rsub|h<rsup|i>=\<pm\>1>|)>exp<around*|(|-E<around*|(|v,h|)>|)>|]>>>|<row|<cell|=>|<cell|-<frac|1|2>J<rsub|\<alpha\>\<beta\>>
    v<rsup|\<alpha\>> v<rsup|\<beta\>>-a<rsub|\<alpha\>>
    v<rsup|\<alpha\>>-ln<around*|[|<around*|(|<big|prod><rsub|i=1><rsup|m><big|sum><rsub|h<rsup|i>=\<pm\>1>|)>exp<around*|(|<frac|1|2>
    L<rsub|i j> h<rsup|i> h<rsup|j>+b<rsub|i> h<rsup|i>+U<rsub|\<alpha\> i>
    v<rsup|\<alpha\>> h<rsup|i>|)>|]>>>|<row|<cell|=>|<cell|-<frac|1|2>J<rsub|\<alpha\>\<beta\>>
    v<rsup|\<alpha\>> v<rsup|\<beta\>>-a<rsub|\<alpha\>>
    v<rsup|\<alpha\>>-ln<around*|[|<around*|(|<big|prod><rsub|i=1><rsup|m><big|sum><rsub|h<rsup|i>=\<pm\>1>|)>q<around*|(|h|)>
    exp<around*|(|U<rsub|\<alpha\> i> v<rsup|\<alpha\>>
    h<rsup|i>|)>|]>-ln Z<rsub|q>>>|<row|<cell|=>|<cell|-<frac|1|2>J<rsub|\<alpha\>\<beta\>>
    v<rsup|\<alpha\>> v<rsup|\<beta\>>-a<rsub|\<alpha\>>
    v<rsup|\<alpha\>>-K<rsub|q><around*|(|U<rsub|\<alpha\> i>
    v<rsup|\<alpha\>>;0|)>-ln Z<rsub|q>,>>>>
  </align>

  where we defined the Boltzmann distribution of the hidden units alone,
  <math|q<around*|(|h|)>\<assign\>exp<around*|(|<frac|1|2> L<rsub|i j>
  h<rsup|i> h<rsup|j>+b<rsub|i> h<rsup|i>|)>/Z<rsub|q>>, with the
  <math|v>-independent normalizer <math|Z<rsub|q>\<assign\><around*|(|<big|prod><rsub|i=1><rsup|m><big|sum><rsub|h<rsup|i>=\<pm\>1>|)>exp<around*|(|<frac|1|2>
  L<rsub|i j> h<rsup|i> h<rsup|j>+b<rsub|i> h<rsup|i>|)>>, and recognized
  the cumulant generating function <math|K<rsub|q><around*|(|t;0|)>> of the
  hidden units evaluated at <math|t<rsub|i>=U<rsub|\<alpha\> i>
  v<rsup|\<alpha\>>>. Then, by cumulant expansion TODO

  \;
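
  The marginalization can be verified by brute force on a tiny machine.
  The sketch below (random couplings of our choosing, <math|n=3> visible
  and <math|m=2> hidden units) compares <math|-ln<big|sum><rsub|h>exp<around*|(|-E<around*|(|v,h|)>|)>>
  with <math|-<frac|1|2>J<rsub|\<alpha\>\<beta\>>v<rsup|\<alpha\>>v<rsup|\<beta\>>-a<rsub|\<alpha\>>v<rsup|\<alpha\>>-K<rsub|q><around*|(|U<rsub|\<alpha\>
  i>v<rsup|\<alpha\>>;0|)>-ln Z<rsub|q>>, where
  <math|q<around*|(|h|)>\<propto\>exp<around*|(|<frac|1|2>L<rsub|i
  j>h<rsup|i>h<rsup|j>+b<rsub|i>h<rsup|i>|)>> has normalizer
  <math|Z<rsub|q>>. These signs follow from the definition of
  <math|E<around*|(|v,h|)>>, since <math|-E<around*|(|v,h|)>> contains
  <math|+U<rsub|\<alpha\> i>v<rsup|\<alpha\>>h<rsup|i>>. The two agree for
  every visible configuration.

```python
import itertools

import numpy as np

rng = np.random.default_rng(4)
n, m = 3, 2  # visible and hidden units (tiny, so every state can be enumerated)


def sym_zero_diag(k):
    """Random symmetric coupling matrix with zero diagonal."""
    A = rng.normal(size=(k, k))
    A = (A + A.T) / 2
    np.fill_diagonal(A, 0.0)
    return A


J, L = sym_zero_diag(n), sym_zero_diag(m)
a, b = rng.normal(size=n), rng.normal(size=m)
U = rng.normal(size=(n, m))
H = np.array(list(itertools.product([-1.0, 1.0], repeat=m)))  # all hidden states
V = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))  # all visible states


def joint_energy(v, h):
    return -0.5 * v @ J @ v - a @ v - 0.5 * h @ L @ h - b @ h - v @ U @ h


def free_energy_direct(v):
    """E(v) = -ln sum_h exp(-E(v, h)) by brute force."""
    return -np.log(sum(np.exp(-joint_energy(v, h)) for h in H))


# Hidden-only Boltzmann distribution q(h) = exp(h.L.h / 2 + b.h) / Z_q.
log_w = 0.5 * np.sum((H @ L) * H, axis=1) + H @ b
Z_q = np.sum(np.exp(log_w))
q = np.exp(log_w) / Z_q


def cgf_q(t):
    """K_q(t; 0) = ln sum_h q(h) exp(t_i h^i)."""
    return np.log(np.sum(q * np.exp(H @ t)))


def free_energy_via_cgf(v):
    """-J v v / 2 - a v - K_q(t; 0) - ln Z_q with t_i = sum_alpha U_{alpha i} v^alpha."""
    return -0.5 * v @ J @ v - a @ v - cgf_q(v @ U) - np.log(Z_q)


gap = max(abs(free_energy_direct(v) - free_energy_via_cgf(v)) for v in V)
print(gap)  # floating-point error only
```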
</body>

<initial|<\collection>
</collection>>

<\references>
  <\collection>
    <associate|Feynman Diagram|<tuple|1|?>>
    <associate|Lemma: Feynman Diagram|<tuple|1|?>>
    <associate|auto-1|<tuple|1|?>>
    <associate|auto-2|<tuple|1.1|?>>
    <associate|auto-3|<tuple|1.2|?>>
    <associate|auto-4|<tuple|2|?>>
    <associate|auto-5|<tuple|3|?>>
    <associate|auto-6|<tuple|3.1|?>>
    <associate|auto-7|<tuple|3.2|?>>
    <associate|auto-8|<tuple|4|?>>
    <associate|auto-9|<tuple|4.1|?>>
  </collection>
</references>

<\auxiliary>
  <\collection>
    <\associate|toc>
      <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|1<space|2spc>Momentum
      & Cumulant> <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-1><vspace|0.5fn>

      <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|2<space|2spc>Hebbian
      Rule> <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-2><vspace|0.5fn>

      <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|3<space|2spc>Perturbation>
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-3><vspace|0.5fn>

      <with|par-left|<quote|1tab>|3.1<space|2spc>First Order Perturbative
      Solution <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-4>>

      <with|par-left|<quote|1tab>|3.2<space|2spc>Validation of the
      Perturbative Solution <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-5>>

      <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|4<space|2spc>Renormalization
      Group: a Probabilitic Perspective> <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-6><vspace|0.5fn>

      <with|par-left|<quote|1tab>|4.1<space|2spc>Example: Binary Boltzmann
      Machine <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-7>>
    </associate>
  </collection>
</auxiliary>